validation metric




Supplementary for: UCLID-Net: Single View Reconstruction in Object Space

Anonymous Author(s)

Neural Information Processing Systems

This section defines the metrics and loss functions used in the main paper. The Earth Mover's Distance (EMD) is a distance that can also be used to compare point clouds. We use the F-Score as a validation metric on the ShapeNet dataset, and we introduce the shell-Intersection over Union (sIoU), which we likewise use as a validation metric on ShapeNet. We also present details of the architecture and training procedure for UCLID-Net.
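The F-Score mentioned above is a standard point-cloud validation metric: precision is the fraction of predicted points within a threshold of the ground truth, recall the fraction of ground-truth points covered by the prediction. A minimal brute-force NumPy sketch (the threshold value and the pairwise-distance implementation are assumptions, not the paper's):

```python
import numpy as np

def fscore(pred, gt, d=0.01):
    """F-Score at distance threshold d between point clouds pred (N,3) and gt (M,3)."""
    # Brute-force pairwise distances, shape (N, M)
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = (dists.min(axis=1) < d).mean()  # predicted points near the ground truth
    recall = (dists.min(axis=0) < d).mean()     # ground-truth points covered by the prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Identical clouds score 1.0; clouds farther apart than the threshold score 0.0.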


Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms

Wang, Wei, Wu, Dong-Dong, Li, Ming, Zhang, Jingxiong, Niu, Gang, Sugiyama, Masashi

arXiv.org Artificial Intelligence

Positive-unlabeled (PU) learning is a weakly supervised binary classification problem, in which the goal is to learn a binary classifier from only positive and unlabeled data, without access to negative data. In recent years, many PU learning algorithms have been developed to improve model performance. However, experimental settings are highly inconsistent, making it difficult to identify which algorithm performs better. In this paper, we propose the first PU learning benchmark to systematically compare PU learning algorithms. During our implementation, we identify subtle yet critical factors that affect the realistic and fair evaluation of PU learning algorithms. On the one hand, many PU learning algorithms rely on a validation set that includes negative data for model selection. This is unrealistic in traditional PU learning settings, where no negative data are available. To handle this problem, we systematically investigate model selection criteria for PU learning. On the other hand, the problem settings and solutions of PU learning fall into different families, i.e., the one-sample and two-sample settings. However, existing evaluation protocols are heavily biased towards the one-sample setting and neglect the significant difference between them. We identify the internal label shift problem of unlabeled training data for the one-sample setting and propose a simple yet effective calibration approach to ensure fair comparisons within and across families. We hope our framework will provide an accessible, realistic, and fair environment for evaluating PU learning algorithms in the future.
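A common baseline in this literature (not necessarily any specific algorithm in the benchmark) is the non-negative PU risk estimator, which evaluates a classifier from positive and unlabeled scores plus an assumed class prior:

```python
import numpy as np

def nnpu_risk(scores_p, scores_u, pi, loss=lambda z: np.log1p(np.exp(-z))):
    """Non-negative PU risk estimate (Kiryo et al.-style), a common baseline.

    scores_p : classifier scores on positive examples
    scores_u : classifier scores on unlabeled examples
    pi       : assumed class prior P(y = +1)
    """
    r_p_plus = loss(scores_p).mean()           # positive-class loss on positives
    r_p_minus = loss(-scores_p).mean()         # negative-class loss on positives
    r_u_minus = loss(-scores_u).mean()         # negative-class loss on unlabeled
    neg_risk = r_u_minus - pi * r_p_minus      # unbiased estimate of the negative risk
    return pi * r_p_plus + max(neg_risk, 0.0)  # clip so the risk stays non-negative
```

The clipping in the last line is what distinguishes the non-negative variant from the unbiased estimator, which can go negative and encourage overfitting.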


Rotationally Invariant Latent Distances for Uncertainty Estimation of Relaxed Energy Predictions by Graph Neural Network Potentials

Musielewicz, Joseph, Lan, Janice, Uyttendaele, Matt, Kitchin, John R.

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) have been shown to be astonishingly capable models for molecular property prediction, particularly as surrogates for expensive density functional theory calculations of relaxed energy for novel material discovery. However, one limitation of GNNs in this context is the lack of useful uncertainty prediction methods, as this is critical to the material discovery pipeline. In this work, we show that uncertainty quantification for relaxed energy calculations is more complex than uncertainty quantification for other kinds of molecular property prediction, due to the effect that structure optimizations have on the error distribution. We propose that distribution-free techniques are more useful tools for assessing calibration, recalibrating, and developing uncertainty prediction methods for GNNs performing relaxed energy calculations. We also develop a relaxed energy task for evaluating uncertainty methods for equivariant GNNs, based on distribution-free recalibration and using the Open Catalyst Project dataset. We benchmark a set of popular uncertainty prediction methods on this task, and show that latent distance methods, with our novel improvements, are the most well-calibrated and economical approach for relaxed energy calculations. Finally, we demonstrate that our latent space distance method produces results which align with our expectations on a clustering example, and on specific equation of state and adsorbate coverage examples from outside the training dataset.
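The latent-distance idea the abstract describes can be sketched generically: score each query by its distance to the nearest training embeddings in latent space, with larger distances signalling less trustworthy predictions. This is a minimal illustration, not the paper's method (the k-nearest-neighbour averaging and the Euclidean, non-rotationally-invariant metric are assumptions):

```python
import numpy as np

def latent_distance_uncertainty(train_z, test_z, k=5):
    """Uncertainty proxy: mean distance to the k nearest training latents.

    train_z : (N, D) latent embeddings of the training set
    test_z  : (M, D) latent embeddings of the queries
    """
    # Pairwise distances from each query to every training embedding, (M, N)
    d = np.linalg.norm(test_z[:, None, :] - train_z[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, :k].mean(axis=1)  # larger = further from the training data
```

A query that lies inside the training distribution gets a lower score than one far outside it.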


Sea wave data reconstruction using micro-seismic measurements and machine learning methods

Iafolla, Lorenzo, Fiorenza, Emiliano, Chiappini, Massimo, Carmisciano, Cosmo, Iafolla, Valerio Antonio

arXiv.org Artificial Intelligence

Sea wave monitoring is key in many applications in oceanography, such as the validation of weather and wave models. Conventional in situ solutions are based on moored buoys whose measurements are often recognized as a standard. However, being exposed to a harsh environment, they are not reliable, need frequent maintenance, and their datasets feature many gaps. To overcome these limitations, we propose a system comprising a buoy, a micro-seismic measuring station, and a machine learning algorithm. The working principle is based on measuring the micro-seismic signals generated by the sea waves: the machine learning algorithm is trained to reconstruct the missing buoy data from the micro-seismic data. As the micro-seismic station can be installed indoors, it ensures high reliability, while the machine learning algorithm provides accurate reconstruction of the missing buoy data. In this work, we present the methods to process the data, develop and train the machine learning algorithm, and assess the reconstruction accuracy. As a case study, we used experimental data collected in 2014 from the Northern Tyrrhenian Sea, demonstrating that the data reconstruction can be done both for significant wave height and wave period. The proposed approach was inspired by Data Science, whose methods were the foundation for the new solutions presented in this work. For example, estimating the period of the sea waves, often not discussed in previous works, was relatively simple with machine learning. In conclusion, the experimental results demonstrated that the new system can overcome the reliability issues of the buoy while keeping the same accuracy.
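The gap-filling idea can be sketched as a regression from micro-seismic features to the buoy's significant wave height. This toy example uses a plain linear least-squares fit on synthetic data, not the authors' actual algorithm; the feature meanings are hypothetical:

```python
import numpy as np

# Hypothetical data: rows = time windows, columns = micro-seismic features
# (e.g. band-limited RMS amplitudes); y = buoy significant wave height Hs.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
true_w = np.array([2.0, 0.5, 1.0])
y = X @ true_w + 0.3                       # synthetic linear relation

mask = np.ones(200, dtype=bool)            # True where buoy data exist
mask[150:] = False                         # simulate a gap in the buoy record

A = np.hstack([X, np.ones((200, 1))])      # add an intercept column
w, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
y_filled = A[~mask] @ w                    # reconstruct Hs inside the gap
```

Because the seismic station keeps recording through the buoy outage, the fitted model can fill the gap in the buoy record from the seismic features alone.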


Deep Convolutional Neural Network for Plume Rise Measurements in Industrial Environments

Koushafar, Mohammad, Sohn, Gunho, Gordon, Mark

arXiv.org Artificial Intelligence

Estimating Plume Cloud (PC) height is essential for various applications, such as global climate models. Smokestack Plume Rise (PR) is the constant height at which the PC is carried downwind as its momentum dissipates and the PC and ambient temperatures equalize. Although different parameterizations are used in most air-quality models to predict PR, they have yet to be verified thoroughly. This paper proposes a low-cost measurement technology to monitor smokestack PCs and make long-term, real-time measurements of PR. For this purpose, a two-stage method is developed based on Deep Convolutional Neural Networks (DCNNs). In the first stage, an improved Mask R-CNN, called Deep Plume Rise Network (DPRNet), is applied to recognize the PC. Here, image processing analyses and least squares are used, respectively, to detect PC boundaries and to fit an asymptotic model to the boundaries' centerline. The y-coordinate of this model's critical point is taken as the PR. In the second stage, a geometric transformation phase converts image measurements into real-world ones. A wide range of images with different atmospheric conditions, including day, night, and cloudy/foggy, have been selected for training DPRNet. The obtained results show that the proposed method outperforms widely used networks in smoke border detection and recognition.
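The second-stage geometric transformation converts pixel measurements to real-world heights. The paper's exact transformation is not given here, but the standard pinhole-camera relation illustrates the idea (the function name and parameters are hypothetical):

```python
def pixels_to_metres(h_px, distance_m, focal_px):
    """Pinhole-camera conversion of an image height to a real-world height.

    h_px       : plume-rise height measured in the image, in pixels
    distance_m : camera-to-smokestack distance, in metres
    focal_px   : focal length expressed in pixels
    """
    # Similar triangles: real height / distance = pixel height / focal length
    return h_px * distance_m / focal_px
```

For example, a 100-pixel rise seen by a 1000-pixel focal-length camera at 500 m corresponds to 50 m.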


Exploring validation metrics for offline model-based optimisation

Beckham, Christopher, Piche, Alexandre, Vazquez, David, Pal, Christopher

arXiv.org Artificial Intelligence

In offline model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of desirability through an expensive but real-world scoring process. Offline MBO tries to approximate this expensive scoring function and use it to evaluate generated designs; however, evaluation is inexact because one approximation is being evaluated with another. Instead, we ask ourselves: if we did have the real-world scoring function at hand, what cheap-to-compute validation metrics would correlate best with it? Since the real-world scoring function is available for simulated MBO datasets, insights obtained from these can be transferred over to real-world offline MBO tasks where the real-world scoring function is expensive to compute. To address this, we propose a conceptual evaluation framework that is amenable to measuring extrapolation, and apply it to conditional denoising diffusion models. Empirically, we find that two validation metrics -- agreement and Fréchet distance -- correlate quite well with the ground truth. When there is high variability in conditional generation, feedback is required in the form of an approximated version of the real-world scoring function. Furthermore, we find that generating high-scoring samples may require heavily weighting the generative model in favour of sample quality, potentially at the cost of sample diversity.
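One common formulation of the Fréchet distance used as a validation metric (as in FID) compares two Gaussians fitted to sample sets; whether this is exactly the paper's variant is an assumption. A pure-NumPy sketch, using the identity Tr((Σ₁Σ₂)^½) = Tr((Σ₂^½ Σ₁ Σ₂^½)^½) to stay within symmetric matrix square roots:

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Fréchet distance between two Gaussians N(mu1, cov1), N(mu2, cov2)."""
    def sqrtm_psd(m):
        # Matrix square root of a symmetric PSD matrix via eigendecomposition
        w, v = np.linalg.eigh(m)
        return (v * np.sqrt(np.clip(w, 0, None))) @ v.T

    s2 = sqrtm_psd(cov2)
    covmean = sqrtm_psd(s2 @ cov1 @ s2)  # symmetric stand-in for (cov1 @ cov2)^0.5
    diff = mu1 - mu2
    return diff @ diff + np.trace(cov1 + cov2 - 2 * covmean)
```

Identical Gaussians give 0; shifting one mean by a unit vector under identity covariances gives exactly 1.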


MetaStackVis: Visually-Assisted Performance Evaluation of Metamodels

Ploshchik, Ilya, Chatzimparmpas, Angelos, Kerren, Andreas

arXiv.org Artificial Intelligence

Stacking (or stacked generalization) is an ensemble learning method with one main distinctiveness from the rest: even though several base models are trained on the original data set, their predictions are further used as input data for one or more metamodels arranged in at least one extra layer. Composing a stack of models can produce high-performance outcomes, but it usually involves a trial-and-error process. Therefore, our previously developed visual analytics system, StackGenVis, was mainly designed to assist users in choosing a set of top-performing and diverse models by measuring their predictive performance. However, it only employs a single logistic regression metamodel. In this paper, we investigate the impact of alternative metamodels on the performance of stacking ensembles using a novel visualization tool, called MetaStackVis. Our interactive tool helps users to visually explore individual metamodels and pairs of metamodels according to their predictive probabilities and multiple validation metrics, as well as their ability to predict specific problematic data instances. MetaStackVis was evaluated with a usage scenario based on a medical data set and via expert interviews.
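A stacking ensemble with a single logistic-regression metamodel, the configuration the abstract attributes to StackGenVis, can be sketched with scikit-learn; the base models and synthetic dataset here are arbitrary placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base models' cross-validated predictions become the metamodel's inputs.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),  # the single metamodel layer
)
stack.fit(X_tr, y_tr)
score = stack.score(X_te, y_te)
```

Swapping `final_estimator` for other classifiers is exactly the kind of metamodel comparison the paper's tool is built to explore visually.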


Do you really know the difference between Test and Validation Datasets?

#artificialintelligence

Many people don't really know the difference between test and validation sets. In Machine Learning these two terms are often used improperly, but they indicate two very different things. Even the literature sometimes reverses their meaning. When training a model, the dataset is usually divided into a train set, a validation set, and a test set, but why are the last two sets needed? Keep reading and you will find your answers.
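The three-way split the post refers to can be sketched as follows; the fractions and seed are arbitrary. The validation set is consulted repeatedly for model selection and tuning, while the test set is held out and used only once for the final, unbiased performance estimate:

```python
import numpy as np

def train_val_test_split(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices 0..n-1 into disjoint train/validation/test index sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    # Validation: tune hyperparameters. Test: touch once, at the very end.
    return idx[n_val + n_test:], idx[:n_val], idx[n_val:n_val + n_test]

train, val, test = train_val_test_split(100)
```

With 100 samples and the default fractions this yields 70 training, 15 validation, and 15 test indices, with no overlap between the three sets.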